Minimax theory


Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

Neural Information Processing Systems

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance, as well as fundamental limits in high-dimensional settings, are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity depends only on the number of relevant dimensions and the mean separation, and can be achieved by a simple computationally efficient procedure. Our results provide a first step toward a theoretical basis for recent methods that combine feature selection and clustering.
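The abstract does not spell out the "simple computationally efficient procedure." Purely as an illustration, the sketch below assumes a two-step approach: screen coordinates by sample variance (in a relevant coordinate $j$ the mixture has marginal variance $1 + m_j^2 > 1$), then split points by the sign of their projection onto the top principal direction of the selected coordinates. The sparsity $s$ is assumed known, and the helper names `sparse_mixture_sample`, `select_then_cluster`, and `accuracy` are hypothetical; this is not claimed to be the paper's exact estimator.

```python
import numpy as np

def sparse_mixture_sample(n, d, s, mu, rng):
    """Draw n points from 0.5*N(-m, I_d) + 0.5*N(m, I_d), where m is nonzero
    only on the first s coordinates (the 'relevant' dimensions)."""
    m = np.zeros(d)
    m[:s] = mu / (2 * np.sqrt(s))  # ||m - (-m)|| = mu, the mean separation
    labels = rng.integers(0, 2, size=n)
    signs = 2 * labels - 1  # map {0, 1} -> {-1, +1}
    X = rng.standard_normal((n, d)) + signs[:, None] * m
    return X, labels

def select_then_cluster(X, s):
    """Keep the s coordinates with the largest sample variance, then split
    points by the sign of their projection onto the top principal direction."""
    Xc = X - X.mean(axis=0)
    var = Xc.var(axis=0)
    keep = np.argsort(var)[-s:]  # relevant coordinates have variance 1 + m_j^2
    Z = Xc[:, keep]
    # The first right-singular vector of the centered data is the leading
    # principal direction of the selected coordinates.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    proj = Z @ vt[0]
    return (proj > 0).astype(int)

def accuracy(pred, labels):
    """Clustering accuracy; cluster labels are only identified up to a swap."""
    acc = np.mean(pred == labels)
    return max(acc, 1 - acc)

# Usage: d = 200 ambient dimensions, only s = 5 of them carry the separation.
rng = np.random.default_rng(0)
X, y = sparse_mixture_sample(n=2000, d=200, s=5, mu=6.0, rng=rng)
pred = select_then_cluster(X, s=5)
acc = accuracy(pred, y)
```

Restricting the principal-direction step to the screened coordinates is what keeps the $d - s$ irrelevant noise dimensions from diluting the signal, which mirrors the abstract's point that the sample complexity should depend on the number of relevant dimensions rather than on $d$.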


Reviews: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

Neural Information Processing Systems

This paper is interesting and deals with a new kind of result that introduces computational aspects into standard minimax theory. The phenomenon illustrated is new to me, and it exposes some limitations of computationally tractable algorithms relative to the "theoretical" ones that could be considered in classical minimax theory. However, given the relative novelty of the framework, it would be important to present the basic definitions and properties more clearly. In the rest of the paper, only one model is investigated.


Reviews: Higher-Order Total Variation Classes on Grids: Minimax Theory and Trend Filtering Methods

Neural Information Processing Systems

This paper considers the graph-structured signal denoising problem. The structure enforced is a total-variation-type penalty involving higher-order discrete derivatives. Optimal rates for the penalized least-squares estimator are established, along with a minimax lower bound. For the grid filtering case, the upper bounds are established under an assumed conjecture. The paper is well written.
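For reference, a penalized least-squares estimator of the kind described here can be written in the generic form below. This is a sketch of the standard trend filtering template, assumed for illustration rather than copied from the paper: $y \in \mathbb{R}^n$ holds the noisy observations at the $n$ nodes, $\lambda \ge 0$ is a tuning parameter, and $\Delta^{(k+1)}$ stands for a $(k+1)$-th order discrete difference operator over the graph, so that $k = 0$ reduces to ordinary TV denoising.

$$\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^n} \; \frac{1}{2}\|y - \theta\|_2^2 + \lambda \left\|\Delta^{(k+1)} \theta\right\|_1$$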


Higher-Order Total Variation Classes on Grids: Minimax Theory and Trend Filtering Methods

Neural Information Processing Systems

We consider the problem of estimating the values of a function over $n$ nodes of a $d$-dimensional grid graph (having equal side lengths $n^{1/d}$) from noisy observations. The function is assumed to be smooth, but is allowed to exhibit different amounts of smoothness in different regions of the grid. Meanwhile, total variation (TV) smoothness classes allow for heterogeneity, but are restrictive in another sense: only constant functions count as perfectly smooth (achieve zero TV). To move past this, we define two new higher-order TV classes, based on two ways of compiling the discrete derivatives of a parameter across the nodes. We relate these two new classes to Hölder classes, and derive lower bounds on their minimax errors. We also analyze two naturally associated trend filtering methods; when $d=2$, each is seen to be rate optimal over the appropriate class.
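To make the limitation of ordinary TV concrete, the minimal sketch below assumes one natural way of "compiling" discrete derivatives on a grid: apply the 1D $(k+1)$-th order difference along each grid axis and sum the $\ell_1$ norms. The function name `axiswise_tv_penalty` is hypothetical, and this construction is an illustrative assumption, not necessarily either of the paper's two classes exactly. With $k = 0$ it is (anisotropic) grid TV, which is zero only for constant functions; with $k = 1$ a linear function also becomes "perfectly smooth."

```python
import numpy as np

def axiswise_tv_penalty(theta, k):
    """Sum over grid axes of the l1 norm of the (k+1)-th order discrete
    differences of theta taken along that axis. k = 0 gives ordinary
    (anisotropic) total variation on the grid."""
    return sum(np.abs(np.diff(theta, n=k + 1, axis=ax)).sum()
               for ax in range(theta.ndim))

# Usage on a 4 x 4 grid (d = 2, n = 16, side length n^{1/d} = 4).
side = 4
i, j = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
const = np.full((side, side), 5.0)
linear = 2.0 * i + 3.0 * j
quadratic = i ** 2

tv_const = axiswise_tv_penalty(const, k=0)    # zero: constants are TV-smooth
tv_linear = axiswise_tv_penalty(linear, k=0)  # nonzero: TV penalizes any slope
k1_linear = axiswise_tv_penalty(linear, k=1)  # zero: second differences vanish
k1_quad = axiswise_tv_penalty(quadratic, k=1) # nonzero: curvature is penalized
k2_quad = axiswise_tv_penalty(quadratic, k=2) # zero: third differences vanish
```

Raising $k$ enlarges the set of functions with zero penalty from constants to polynomials of degree at most $k$ along each axis, which is the sense in which such higher-order classes move past the restriction of plain TV.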